Abstract

A series of six studies conducted in laboratory and classroom settings investigated the diagnostic and remedial performance of reading and learning disabilities specialists and classroom teachers. The participants' (N=66) basic task was to diagnose simulated cases of either reading or learning disability, and to suggest an initial remediation plan. There are two related findings across all studies. First, commonality (the extent to which clinicians made the same statements about a case) is very low; most statements in the written diagnoses and remediations for a given case were mentioned only once. Only 3% of the statements were mentioned in half or more of the diagnoses for the same case. Second, individual agreement (between two clinicians on the same case and one clinician at two different times for the same case) was also very low. Mean diagnostic agreement between two clinicians remained close to zero across the six studies. Mean diagnostic agreement results for a single clinician on a case across time showed that only 20% of the statements were agreed upon both times by the same person. Additionally, analysis of diagnostic and remedial process in three of the studies revealed wide variability in total time taken to collect case information (cues) and in the number of cues collected. Neither was significantly correlated with agreement.
DIAGNOSING CHILDREN WITH EDUCATIONAL PROBLEMS:
CHARACTERISTICS OF READING AND LEARNING DISABILITIES SPECIALISTS
AND CLASSROOM TEACHERS

John F. Vinsonhaler, Annette B. Weinshank,
Christian C. Wagner, and Ruth M. Polin

Diagnosis is accorded importance by nearly all authorities in the field of reading. Diagnosis as the basis for remediation is a cardinal principle in the literature and in the world of practice (Ekwall, 1976; Spache & Spache, 1973; Carter & McGinnis, 1970; Otto, McMenemy, & Smith, 1973; Rabinovitch, 1965; Smith, 1969; Smith, Carter, & Dapper, 1970). Many view diagnosis as an essential and integral part of total reading instruction and as a basic element of all efficient teaching (Otto et al., 1973; Sheldon, 1968; Smith et al., 1970). Diagnosis is seen as a preliminary step to sound instruction; a guide to teachers in the planning, modification, and individualization of instruction (Bond, 1970; Bond & Tinker, 1967; Daufcat, 1977; Dietrich, 1972; Farr, 197X; Karlsen, 19?6; Olson & Dillner, 1976; Sawyer, 1968; Smith, 1969; Smith et al., 1970; Swalm, 1973; Austin, Note 1). While it is generally agreed that diagnosis is important, there is less consensus on its content, how it is conducted, and the frequency of a useful diagnosis.

Major Orientations

At least three major orientations toward diagnostic content can be found in the literature. One approach concentrates on establishing the child's general reading level as compared to his/her reading potential (Guszak, 1972; Spache, 1976). A second orientation emphasizes the examination of the child's performance on a set of reading skills. Some authors suggest that the diagnosis include both strengths and weaknesses (Peters, 1977; Monroe, 1968; Carter, 1970; Carter & McGinnis, 1970). A third group of authors view the diagnosis as a determination of causality, that is, an understanding of the underlying factors that have caused the reading problems. Such an understanding, they feel, enables the clinician to prescribe the most appropriate steps for remediation (Harris, 1972; Strang, 1964; Natchez, 1968; Monroe, 19^; Carter & McGinnis, 1970; Harris, 1977).

More specifically, the first method emphasizes a diagnosis conducted by the teacher in the classroom and concerns the early detection of reading problems. This type involves little clinical testing or interaction with individual students. Classroom diagnosis is typically a group event involving the administration of group tests (Carter & McGinnis, 1970; Kennedy, 1971; Otto et al., 1973; Smith et al., 1970; Wilson, 1977). As such, it does not require much time and is an informal process in which the classroom teacher can observe a group or individual students over a long period of time (Smith et al., 1970; Wilson, 1977).

The second method of diagnosis posits that reading difficulties of some students are too serious to be dealt with solely by the classroom teacher. A specialist becomes responsible for diagnosis (Smith et al., 1970; Wilson, 1977). Such a diagnosis focuses mainly on skill performance and is formal, analytical, and specific (Bond & Tinker, 1967).
A third method emphasizes a diagnosis performed in a reading clinic. Clinical diagnosis of reading difficulties is designed to deal with severe cases that cannot be handled in a regular school setting. Although part of this diagnosis can be conducted by a school reading specialist, other phases must be carried out by clinicians from various disciplines (psychologists, audiologists, physicians, etc.). Clinical diagnoses are oriented mainly toward the determination of causal factors. They require an intensive, thorough case study of an individual child, including personality factors (Strang, 1969).

Frequency of Diagnosis

Some authors argue that the diagnosis should be conducted on a regular basis before and during remediation. Others argue that diagnosis should be a continuous process in response to changing information about the child and his reading problems (Bond, 1970; Bond & Tinker, 1967; Otto et al., 1973; Spache, 1976; Smith, 1969; Smith et al., 1970; Strang, 1964). While empirical evidence on optimum frequency of reading diagnosis is scarce, research on physicians' decision making confirms the view that diagnosis occurs over time and is modified in the face of new data. Eventually, however, for most physicians, the diagnosis stabilizes to form the basis of an initial plan of therapy.

Regardless of the method, content, and frequency of reading diagnosis, nearly all authors agree that the diagnosis should form the basis for remediation. Here again, however, little empirical evidence exists in reading about the relationship between individual diagnosis and remediation. Spache (in Newman, 1969) contended that there was still widespread lack of integration between the two processes of diagnosis and remediation and stated,

Numerous reports of remedial work give evidence that the procedures used are not directly related to the detailed diagnostic findings.

Bateman (Note 2) asked,

Was the diagnosis a necessary and sufficient prerequisite to the remediation? Might other remediation, not derived from the diagnosis, have been equally successful? A child is diagnosed but the remediation is not successful. Was the diagnosis inadequate, or was an error made in deriving the remediation?



Instructional Consequences of Low Reliability

Bateman's point can be made more concrete by examining the instructional consequences of unreliable diagnoses for the same or similar cases. Low reliability can be interpreted as a function of unpredictable judgments, that is, the chance assignment of children to diagnostic categories. First, consider the case in which children have a problem with word attack skills and a known effective remediation is available. In the context of low diagnostic reliability, some of the children will be diagnosed as having the problem, will be correctly treated, and will show improvement in reading. The other children in the group will be incorrectly diagnosed, will receive no remediation, and will show an overall loss in performance in relation to their classmates.

Now consider the case of a group of children who do not have a word attack problem. Some will be correctly diagnosed as not having word attack problems and will receive no treatment. Others will be incorrectly diagnosed and will spend their academic time on drill and practice for skills already mastered.

In the examples above, the effectiveness of the remediation was known. Now, let us examine the impact of unreliable diagnosis when the efficacy of treatment is not known and must be evaluated. An evaluation study is performed by obtaining the apparent diagnosis for each child and applying the correct remediation in terms of the stated diagnosis. Suppose a group of children cannot read two- and three-syllable words. Within this group some children lack mastery of major sound-symbol associations (sound-symbol problem). Others have poor syllabication and blending skills (syllabication problem). Further, suppose we are evaluating two treatments: Treatment A is effective for the sound-symbol problem and Treatment B is effective for the syllabication problem. Assume that each treatment works primarily for one problem only.

Consider the group that receives the apparent diagnosis of sound-symbol problem. Some of the group will actually have this problem, will receive Treatment A, and will show good improvement. The others will actually have the syllabication problem, will receive Treatment A instead of Treatment B, and will show no improvement. Overall, the group with the apparent diagnosis of sound-symbol problem will show only a modest improvement as a result of Treatment A. A similar dilution of treatment effect will obtain for the group with the apparent syllabication problem. Treatment B will not be appropriate for the entire group.

Overall, the effectiveness of the two treatments will be systematically underestimated to the degree that the diagnoses are unreliable. Obviously, reliability of diagnosis provides no information as to its validity (one can be reliably wrong). Reliability only permits the correct estimation of remedial effectiveness (Collen, Rubin, Neyman, Dantzig, Baer, & Siegelaub, 1964).
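
The dilution argument can be illustrated with a small arithmetic sketch. It is not part of the studies reported here: the function name `observed_gain`, the parameter `fraction_correct`, and the gain of 10 points are hypothetical, and the sketch assumes, as in the example above, that a treatment produces its full gain for children who actually have the matching problem and no gain for the others.

```python
def observed_gain(true_gain: float, fraction_correct: float) -> float:
    """Mean gain seen in a group that shares one apparent diagnosis.

    true_gain        -- gain for a child who really has the diagnosed problem
    fraction_correct -- share of the group whose apparent diagnosis is right
                        (hypothetical parameter, between 0 and 1)

    Assumption of this sketch: misdiagnosed children gain nothing, because
    each treatment works for one problem only.
    """
    if not 0.0 <= fraction_correct <= 1.0:
        raise ValueError("fraction_correct must be between 0 and 1")
    gain_when_misdiagnosed = 0.0
    return (fraction_correct * true_gain
            + (1.0 - fraction_correct) * gain_when_misdiagnosed)


if __name__ == "__main__":
    # The same treatment looks weaker as fewer diagnoses are correct.
    for fraction in (1.0, 0.75, 0.5):
        print(fraction, observed_gain(10.0, fraction))
```

Under these assumptions the evaluated effect of Treatment A shrinks in direct proportion to the share of correct diagnoses, which is the sense in which unreliable diagnosis leads to underestimated treatment effectiveness.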
Diagnostic Agreement Studies

Although the literature says much about the importance of diagnosis in reading, little empirical data exists. The few diagnostic agreement studies in education suggest that groups of clinicians, working together, can produce mutually agreed-upon diagnostic statements (Lerner & Schuyler, 1973). Some studies in the medical literature, however, investigating the agreement of individual physicians' medical judgments, have revealed marked disagreement among physicians (Garland, 1959). Paton (1957) reported an error rate of 56% in diagnosing myocardial infarction based on autopsy results. In the diagnosis of pulmonary disease from x-ray photographs, the agreement of the average physician on the diagnosis was generally 80% with himself and 70% with other radiologists (Fletcher, 1952; Cochrane and Garland, 1952; Yerushalmy, 1955, 1969). Finally, in the diagnosis of various psychiatric disorders, there may be total disagreement among diagnosticians (Kendall, 1975).

The studies presented here derive from a program of empirical research on diagnostic problem solving in medicine (Elstein, Shulman, & Sprafka, 1978). These medical studies sought to capture the diagnostic methods used by highly skilled physicians who were presented with realistically simulated medical cases. The researchers concluded that the physicians seemed to be hypothesis directed (they generated successively more precise hypotheses of the patient's medical problems) and tested these hypotheses until a level of precision was reached that was satisfactory for treatment.

Understanding Diagnosticians' Problem-Solving Behavior: Study 1

The research to be reported in this paper was directed toward the understanding of diagnostic problem-solving behavior of expert practitioners in the field of education: reading specialists, learning disabilities specialists, and classroom teachers. The studies were based upon careful observation in a controlled setting.

The first study documented the characteristics of the interaction between reading specialists and a child with a reading problem to determine (1) what information these specialists collected, (2) what diagnostic categories they used, (3) what remedial actions they recommended, (4) how their diagnoses and remediations were related, and (5) how reliable these decisions were. The five subsequent studies were concerned with (1) replicating the original study, (2) examining the generalizability of the results of the initial study to other populations (three studies), and (3) examining the possible effects of artifacts of the data analysis procedures on the results (two studies).

The purpose of the first observational study, conducted in 1977, was to provide insight into the interaction between reading specialists and cases of reading difficulty. It was expected that the problem solving performance of these highly trained clinicians could serve as a model for the field of reading as it had in medicine. The experimental task for the individual clinicians was to diagnose and suggest remediation for simulated cases of reading difficulty.

Use of Simulated Cases

The use of simulated cases (as opposed to using a naturalistic setting with real children) insured that variation in clinician performance was attributable to variation in clinician, not case. Research in medicine allays the concern that the diagnosis of simulated cases is a substantively different task from the diagnosis of real children. Norman and Tugwell (Note 3) support the assumption that important problem solving behaviors of clinicians can be elicited through simulated cases.

Each simulated case in this study consisted of collections of information about a child with reading problems. The simulated cases were based on real children in Grades 3-7 who had attended the Michigan State University Reading Clinic. They were considered by staff clinicians and outside consultants to be representative of reading problems commonly encountered in public schools. Across all the cases, the representative problems included sight word deficiencies, inadequate structural and phonetic analysis skills, inadequate oral reading fluency, and poor comprehension. Across all the cases, information about the child's achievement level, family and academic background, cognitive ability, reading ability, classroom behavior, and so on was presented in a variety of formats including test scores, completed test booklets, audio tapes, and written comments. Each simulated case was kept in a large file box and included an inventory of information (cues).

Four different simulated cases were created. Each simulated case had a replicate, a superficially disguised version prepared by making minor changes in the original case (Lee & Weinshank, Note 4). This made a total of eight different cases and allowed for a test/retest design.

Case 1: Stephen

1. Initial contact information
   Age 8½   Grade 3   Taped interview
   Referred by teacher for reading problems

2. Potential for reading
   Good, at grade level or above
   Wechsler Intelligence Scale for Children (WISC)
   Total I.Q.: 118   Verbal: 115   Performance: 118

3. Sight word vocabulary
   First Grade Dolch word list: 61%
   Slosson Oral Reading Test (SORT) placement: 2.2

4. Decoded word recognition
   Serious problem indicated: Gates-McKillop Subtest "Recognizing and Blending Common Word Parts" shows only 6 of 23 nonsense words were read correctly

5. Oral reading
   Inadequate fluency
   Durrell Analysis of Reading Difficulty, Oral Reading Subtest: low second grade rate

6. Comprehension
   Listening comprehension above grade level
   Durrell Analysis of Reading Difficulty, Listening Comprehension Subtest: Grade 5; Durrell Silent Reading Subtest: third grade

Case 2: Donald

1. Initial contact information
   Age 11   Grade 6   Taped interview
   Referred by teacher because of difficulties with reading related tasks

2. Potential for reading
   Adequate for grade level
   WISC Total I.Q.: 95   Verbal: 86   Performance: 106
   Auditory acuity problem; audiological evaluation indicates significant hearing loss in upper frequency range

3. Sight word vocabulary
   Significantly below grade placement
   SORT placement: beginning 4th grade

4. Decoded word recognition
   Serious problem with decoding multisyllabic words
   Durrell Word Recognition and Analysis Subtest: both at 4th grade level

5. Oral reading
   Difficulty with phrasing
   Durrell Oral Reading Subtest results indicate a word by word reader, rate at third grade level

6. Comprehension
   Problems in listening comprehension
   Durrell Listening Comprehension Subtest: Grade 4.5
   Problems in silent reading comprehension
   Gates-MacGinitie Comprehension Subtest: Grade level 2.9
Case 3: Mike

1. Initial contact information
   Age 12   Grade 7   Taped interview
   Referred by parents who were concerned with his progress in areas involving reading, writing, and spelling.

2. Potential for reading
   Good for grade level and above
   WISC Total I.Q.: 105   Verbal: 111   Performance: 98

3. Sight word vocabulary
   Reasonably intact sight vocabulary for grade level
   Durrell Word Recognition: high 6th grade
   SORT grade equivalent: 6.8

4. Decoded word recognition
   Adequate to grade level
   Gates-McKillop Recognizing and Blending Common Word Parts: 20 of 23 nonsense words read correctly
   Durrell Word Analysis: mid sixth grade level
   Inadequate higher level decoding skills
   Gates-McKillop Syllabication Subtest: grade equivalent of 4.0

5. Oral reading
   Serious problems with fluency
   Durrell Oral Reading Subtest: rate is high fourth grade equivalent
   Gates-MacGinitie Speed & Accuracy Subtest: 5.2 grade equivalent

6. Comprehension
   Listening comprehension at grade level
   Durrell Listening Comprehension Subtest: 6th grade
   Silent reading comprehension below grade placement
   Gates-MacGinitie Comprehension Subtest: 5.5 grade equivalent

Case 4: Dan

1. Initial contact information
   Age 9   Grade 4   Taped interview
   Referred by teacher and parents concerned about his basic reading skills and his lack of progress with reading related subjects.

2. Potential for reading
   Adequate for grade level reading and above
   WISC Total I.Q.: 101   Verbal: 103   Performance: 102

3. Sight word vocabulary
   Significantly below grade placement
   Dolch List: 71% on List 2

4. Decoded word recognition
   Severe problem with learning and application of decoding skills
   Durrell Word Analysis Subtest: low first grade
   Durrell Visual Memory of Words Subtest: 1.5 grade equivalent

5. Oral reading
   Serious problem with rate
   Durrell Oral Reading Subtest: rate comparable to year-end second grader

6. Comprehension
   Listening comprehension above grade placement
   Durrell Listening Comprehension Subtest: 5th grade level
   Silent reading comprehension seriously depressed
   Iowa Test of Basic Skills Comprehension Subtest: grade 1.8 equivalency

The four case abstracts describe only part of the information available for each case. A complete listing of information for one of the cases is presented in Table 1.

Case replicates were prepared for all four simulated cases described above. The replicates were superficially disguised versions of Cases 1-4, prepared by making minor changes in each original case: changing names, using alternate forms of tests, re-recording tapes of oral reading, and so on.

The Study Participants

Participants were recruited from the most senior and most effective practicing clinicians in the mid-Michigan area. Recommendations were solicited from university faculty and/or school administrators. A set of eight repeatedly recommended clinicians was selected. All subjects had master's or doctoral degrees in reading and had been practicing as reading specialists for at least five years. They had received their training in various eastern and midwestern universities. All were paid at professional rates for their participation.
Table 1
Case 4: Dan
Cue Inventory

Information

Physical
   Vision
   Audiometry record

Background
   School record
   Teacher form
   School information
   Parent form

Assessment
   Basic sight vocabulary (Dolch list)
   Sentence completion
   Reading diagnostic tests (Gates-McKillop)
      -Recognition and blending common word parts
      -Auditory blending
      -Giving letter sounds
   Auditory discrimination (Wepman)
   Durrell listening/reading series, intermediate level
      -Vocabulary
      -Paragraphs
   Diagnostic analysis of reading difficulty (Durrell)
      -Oral
      -Silent
      -Listening comprehension
      -Word recognition and word analysis
      -Hearing sounds in words, primary
      -Visual memory of words, primary
      -Intermediate spelling, List 1
      -Phonic spelling of words
   Achievement test (Iowa Test of Basic Skills)
      -Vocabulary
      -Reading
   Graded word list (Slosson Oral Reading Test)
   Reading achievement (Gates-MacGinitie)
      -Speed and accuracy
   Cognitive ability (Wechsler Intelligence Scale for Children)
      -Verbal
      -Performance
      -Full scale
Design

Each clinician participated in three experimental sessions over a three-week period. Across the twenty-four sessions, each case/replicate was examined six times. Clinicians were randomly assigned to cases within the constraints of test/retest design and counterbalancing (see Table 2).



Table 2
Design of the Study

                 Case examined in session
Clinician        1         2         3

A                1         4         1R
B                2         3         2R
C                3         1         3R
D                4         2         4R
E                4R        2         4
F                3R        1         3
G                2R        3         2
H                1R        4         1

Note. R = replicate case.
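
The constraints described under Design can be checked mechanically against the case sequences in Table 2. The sketch below is only an illustration: the clinician labels A through H and the helper names (`base_case`, `examinations_per_case`, `has_test_retest`) are assumptions introduced here, and the sequences are transcribed from the table.

```python
from collections import Counter

# Case sequences per clinician across sessions 1-3, transcribed from Table 2.
# "R" marks the replicate of a case. Clinician labels A-H are assumed.
DESIGN = {
    "A": ["1", "4", "1R"],
    "B": ["2", "3", "2R"],
    "C": ["3", "1", "3R"],
    "D": ["4", "2", "4R"],
    "E": ["4R", "2", "4"],
    "F": ["3R", "1", "3"],
    "G": ["2R", "3", "2"],
    "H": ["1R", "4", "1"],
}


def base_case(label: str) -> str:
    """Strip the replicate marker so '1R' and '1' count as the same case."""
    return label.rstrip("R")


def examinations_per_case(design: dict) -> Counter:
    """Count how often each case (original or replicate) is examined."""
    return Counter(base_case(label)
                   for sequence in design.values()
                   for label in sequence)


def has_test_retest(sequence: list) -> bool:
    """True if the first and last sessions pair a case with its replicate."""
    first, last = sequence[0], sequence[-1]
    return base_case(first) == base_case(last) and first != last


if __name__ == "__main__":
    print(examinations_per_case(DESIGN))
    print(all(has_test_retest(seq) for seq in DESIGN.values()))
```

With the sequences as transcribed, each of the four cases is examined six times across the twenty-four sessions, and every clinician sees a case and its replicate in the first and third sessions, with a different case in between.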

Procedures

Each session took place in a small room with a one-way mirror and consisted of an observation and a debriefing. Three people were present: the subject, an experimenter, and an observer who was also trained in reading. The experimental task for the subject was the diagnosis and remediation of a simulated case.

The observation. No time limit was imposed. The experimenter and subject sat near the one-way mirror; the observer sat on the other side of it. The experimenter began by helping the subject practice the experimental procedures using a simulated case different from the one to be used during the actual session. The session proceeded with the presentation of referral information and continued with the subject requesting one piece of information at a time from the cue inventory. The experimenter would locate the information in the file box and present it to the subject. When the subject had collected as much information as desired, s/he was asked to write a diagnosis and an initial remedial plan. During consideration of the case, the subject was asked to verbalize his/her thinking, provided that doing so did not interfere with performance. The subject was encouraged to keep notes and proceed with his/her normal methods for diagnosing a case. Meanwhile, the observer on the other side of the mirror recorded on a standard observation form the information that was requested and the comments that were made by the subject.

The debriefing. The observer joined the subject and the experimenter. The three participants then reviewed the record of the subject's performance in the first part of the session. The observer reviewed with the subject each step of the interaction with the case, starting with the very first cue request and proceeding through the writing of the diagnosis. A set of three questions guided the debriefing for each cue: "Why did you ask for this piece of information? What did it tell you? Did you have any hunches that were confirmed or ruled out, or was the information irrelevant?" The observer was free to ask the subject to expand on any statement that the observer believed to be significant. The subjects' comments were recorded on a standard debriefing form. The intent was to reconstruct the clinician's thinking: Why were particular cues requested? How were specific cues interpreted? What hypotheses were generated by specific cues? Which cues confirmed or disconfirmed existing hypotheses?

Following the debriefing session, the subject had the opportunity to revise the written diagnosis in the event that the debriefing session had altered his/her thinking about the case. Three products for the entire experimental session were (1) the standard observation form, which included cues collected, times of cue requests, and observer comments; (2) the standard debriefing form; and (3) the subject's written diagnosis and remediation, including any additions made as a result of debriefing. For a more detailed description of the procedures for this study see Lee and Weinshank (Note 4).

Data Analysis

The clinicians' statements in the written diagnoses for each case were analyzed at two levels. First, the frequency of each diagnostic statement made across all sessions for a given case was tabulated (diagnostic commonality statistic). The proportion of sessions in which a statement was made provided an index of commonality for that statement. Second, the relationship between each pair of diagnoses was computed (diagnostic agreement statistics). The mean agreement statistics provided a measure of individual agreement for that case: interclinician agreement (between two clinicians on the same case) and intraclinician agreement (one clinician at two different times for the same case).

Diagnoses were compared in the following way. The natural language statements in each diagnosis were translated into a standard vocabulary (see examples in Table 3), established by project reading clinicians who sorted the diagnostic statements made in all sessions into equivalence classes. The more than two thousand separate diagnostic statements made across all cases were grouped into 162 labeled classes. The interrater reliability was estimated by randomly reclassifying 10% of the 2,000 statements a second time. The result was a
Table 3
A Portion of the Standardized Diagnostic Categories

Number    Category                                              Type(a)

3         Normal interests and behavior                         O
36        At least average reading potential                    S
39        Meaning vocabulary weak                               W
50        Problem with visual memory                            W
54        Reading not a meaningful act                          W
60        Poor oral reading                                     W
64        No comprehension problem                              S
65        Reading comprehension inadequate                      W
71        Good use of context                                   S
72        Inconsistent use of context for word recognition      W
92        Sight words low                                       W
99        Insufficient visual discrimination and word scan      W
126       General statements about phonics                      O
155       General statements about language                     O
158       General home background statements                    O

(a) W=weakness; S=strength; O=observation
Table 4
Conversion to a Standard Vocabulary of Three Diagnoses

Diagnosis 1
Category(a)   Statement
99            He looks at the first letter or first few letters in the word.
72            He guesses, many times wildly. The context of his guesses do not make sense.
64            Scores indicate good comprehension.
60            He doesn't read fluently.
92            Has few automatic words.

Diagnosis 2 (simulated case 4R)
Category(a)   Statement
99            Only looks at the first few letters in the word ignoring the middle and the end.
92            Storehouse of sight words low.
71            Use of context is used well by Brian.
50            Poor visual memory and sequential memory.
155           He has a quantity of language but the quality may be somewhat lacking.

Diagnosis 3
Category(a)   Statement
92            Needs to increase sight vocabulary.
71            One notable strength is his use of context.
64            Able to extract meaning from the code.
36            He has the potential to be an average or slightly above average reader.
126           He has some initial consonants and blends.

(a) Refer to categories in Table 3.
75% placement of the statements in the identical categories the second time. The possibility of error in equating statements (that subjects might use different words to describe the same problem or that similar vocabulary might mask actual differences in meaning) was negligible. Our subsequent studies, in which vocabulary was controlled at the outset, showed this to be only a minor source of error (Hoffmeyer, Note 5; Stratoudakis, Note 6).
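
The reclassification check amounts to a simple percent-agreement computation. The sketch below is illustrative only: the function name `percent_agreement` and the four category codes are hypothetical and are not the project's data; it assumes each reclassified statement receives exactly one category on each pass.

```python
def percent_agreement(first_pass: list, second_pass: list) -> float:
    """Share of statements placed in the identical category on both passes."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must classify the same statements")
    if not first_pass:
        raise ValueError("at least one statement is required")
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return matches / len(first_pass)


if __name__ == "__main__":
    # Hypothetical category codes for four statements classified twice.
    first = [99, 72, 64, 60]
    second = [99, 72, 64, 92]  # the last statement lands in another class
    print(percent_agreement(first, second))
```

In this made-up example three of four statements keep their category, so the agreement is the same proportion the authors report for their 10% sample.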

The process of converting natural-language diagnostic statements into standardized diagnostic categories is illustrated in Table 4. The three diagnoses are all for the same case. The table presents sample diagnostic statements in both natural language and standardized categories. Thus, "He only looks at the first letter or first few letters in the word," was assigned to Diagnostic Category 99 (insufficient discrimination and word scan: weakness).

In order to determine commonality for each diagnostic statement for a case, a proportion was computed for each equivalence class in the standard vocabulary. The calculation of diagnostic commonality for three categories based on the sample diagnoses is shown in Table 5.



Table 5
Calculating Diagnostic Commonality(a)

                                      Diagnosis(c)
Category(b)                           1      2      3

92  Sight words low (W)               I      I      I
71  Good use of context (S)           A      I      I
60  Poor oral reading (W)             I      A      A

(a) The commonality statistic is calculated by dividing the number of times a category is included in a set of diagnoses by the total number of diagnoses in the set: C92=3/3=1.00, C71=2/3=.67, C60=1/3=.33.
(b) W=weakness; S=strength; O=observation.
(c) I indicates presence and A indicates absence from a diagnostic category.
As the table shows, the presence or absence of each diagnostic
category is tabulated for each diagnosis. For example, Category 71
(good use of context) is absent from Diagnosis 1, but present in
Diagnoses 2 and 3. The actual commonality statistic is calculated at the
bottom of Table 5 (Footnote a). Note that these are only examples;
the actual diagnoses contained many more statements. Further, each
diagnostic commonality was calculated on the basis of six diagnoses,
not three.
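
The commonality calculation described above can be sketched in a few lines of Python. This is only an illustration, not the software used in the studies: the function name `commonality` and the representation of each diagnosis as a set of category numbers are assumptions made for the example, and the data are the three sample diagnoses of Table 5.

```python
from collections import Counter


def commonality(diagnoses):
    """Return {category: proportion of diagnoses that include it}.

    `diagnoses` is a list of sets of standardized category numbers
    (a hypothetical representation chosen for this sketch).
    """
    n = len(diagnoses)
    # Count each category at most once per diagnosis.
    counts = Counter(cat for diagnosis in diagnoses for cat in set(diagnosis))
    return {cat: count / n for cat, count in counts.items()}


# Sample diagnoses from Table 5, reduced to category numbers.
sample = [
    {92, 60},  # Diagnosis 1: sight words low, poor oral reading
    {92, 71},  # Diagnosis 2: sight words low, good use of context
    {92, 71},  # Diagnosis 3: sight words low, good use of context
]

result = commonality(sample)
for category in sorted(result, reverse=True):
    print(f"C{category} = {result[category]:.2f}")
```

A category that no diagnosis mentions never enters the dictionary, which matches the tabulation above: only categories actually used for a case receive a commonality value.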

The commonality statistic gives no information about the extent
of agreement between any two particular diagnoses for a given case.
For this we used the agreement statistic. An agreement matrix
first lists the categories (by category number) present in or absent
from the two diagnoses (numbers in the upper part of each box) and
their frequency. Below the matrix are the calculations for diagnostic
agreement.

Table 6

Process for Determining Presence
or Absence of Diagnostic Agreement

                                  Diagnosis 1
                       +                       -

Diagnosis 2   +   Categories              Categories
                  92, 99                  155, 71, 50
                  N(+,+)=2                N(+,-)=3               A+B = 5

              -   Categories              Categories
                  72, 64, 60              3, 36, 39, 54, 65,
                                          126, 158
                  N(-,+)=3                N(-,-)=7               C+D = 10

                  A+C = 5                 B+D = 10

Note. +=present
      -=absent
      N=frequency of categories included in or absent from two diagnoses
      A=N(+,+); B=N(+,-); C=N(-,+); D=N(-,-)

$$\text{Phi} = \frac{(A \times D) - (B \times C)}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} \qquad \text{Porter} = \frac{A}{A+B+C}$$

$$\text{Phi}(1,2) = \frac{(2 \times 7) - (3 \times 3)}{\sqrt{(5 \times 10)(5 \times 10)}} = .10 \qquad P(1,2) = \frac{2}{2+3+3} = .25$$



The frequencies are used to calculate Phi and Porter Coefficients.

The Porter Coefficient (bounded by 0 and 1) is easily interpreted.
It is the number of diagnostic categories present in both diagnoses (A)
divided by the number of categories present in either or both
diagnoses (A+B+C). As the table shows, the Porter Coefficient for sample
Diagnoses 1 and 2 is .25: Two out of eight diagnostic statements were
agreed upon.

The Phi Coefficient is equivalent to the Pearson Product Moment
Correlation when all scores are zero or one. Interpretation of the
coefficient is usually similar to that of the Pearson: Zero indicates
no relationship; one indicates perfect relationship. The baseline for
interpreting the Phi in our work has been a study by Barrows, Feightner,
Neufeld, & Norman, who presented the same cases to 60 different
physicians. The average Phi Coefficient for diagnoses of the same case
was approximately .40. Since these diagnoses were based upon histories
and physical exams with no verifying laboratory information available,
they might be considered analogous to the type of information used by
our subjects. Therefore, it could be argued that agreement of less
than .40 would indicate a less than satisfactory state of affairs for
the reading profession.
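
As an illustrative sketch (not the analysis program used in the studies), the two agreement coefficients can be computed from a pair of diagnoses as follows. The helper names `cell_counts`, `porter`, and `phi` are assumptions introduced for this example, each diagnosis is assumed to be a set of category numbers, and the vocabulary is restricted to the 15 categories that appear in Table 6 rather than the full standardized list.

```python
import math


def cell_counts(first, second, vocabulary):
    """Return the 2x2 frequencies (A, B, C, D) for two diagnoses."""
    a = len(first & second)                 # present in both
    b = len(first - second)                 # present only in the first
    c = len(second - first)                 # present only in the second
    d = len(vocabulary - (first | second))  # absent from both
    return a, b, c, d


def porter(a, b, c):
    """Categories shared by both diagnoses over categories used by either."""
    return a / (a + b + c)


def phi(a, b, c, d):
    """Phi coefficient of a 2x2 table of presence/absence frequencies."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Categories taken from the cells of Table 6.
first = {92, 99, 155, 71, 50}
second = {92, 99, 72, 64, 60}
vocabulary = first | second | {3, 36, 39, 54, 65, 126, 158}

a, b, c, d = cell_counts(first, second, vocabulary)
porter_value = porter(a, b, c)
phi_value = phi(a, b, c, d)
print(f"A={a} B={b} C={c} D={d}")
print(f"Porter = {porter_value:.2f}, Phi = {phi_value:.2f}")
```

Both coefficients are symmetric in B and C, so it does not matter which of the two diagnoses is treated as the first.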

Methodological problems exist with the use of the Phi, since
unequal marginal frequencies place bounds on the range of the statistic.
Furthermore, one cannot assert that all entries in the "D" cell are the
result of conscious decisions during both sessions to omit a diagnostic
statement. One cannot clearly say whether the omission of a statement
is the result of deciding to leave out that statement or never having
considered it in the first place. Based on subsequent analysis of
process, we have concluded that most of the entries in the "D" cell
represented diagnostic statements not considered by either clinician.
Therefore, the "D" cell artificially inflates the correlation. The Porter
Statistic avoids the problem of the inflated "D" cell by including only
the statements actually made by one or both clinicians.
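
The sensitivity of Phi to the "D" cell can be seen in a small numerical sketch. The frequencies A=2, B=3, C=3 are those of Table 6; the alternative values of D (50 and 154) are hypothetical and chosen only to show the direction of the effect, with 154 corresponding to the assumption that every one of the 162 standardized categories not used by either clinician is counted as joint absence. The function names are illustrative.

```python
import math


def porter(a, b, c):
    """Porter coefficient: ignores the joint-absence cell D entirely."""
    return a / (a + b + c)


def phi(a, b, c, d):
    """Phi coefficient of a 2x2 presence/absence table."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


a, b, c = 2, 3, 3          # frequencies from Table 6
d_values = [7, 50, 154]    # 7 is from Table 6; 50 and 154 are hypothetical

phi_values = [phi(a, b, c, d) for d in d_values]
porter_value = porter(a, b, c)

for d, value in zip(d_values, phi_values):
    print(f"D={d:3d}  Phi={value:.2f}  Porter={porter_value:.2f}")
```

With the statements actually made held fixed, Phi rises as more never-considered categories are added to the "D" cell, while the Porter value does not change.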

Agreement on information collected by each subject about the
case was measured using the same procedures as the diagnoses. In
addition, diagnostic processes such as hypothesis generation and
time-related measures were examined.

Clinicians' Diagnoses

Each of the eight clinicians prepared a written diagnosis and
remedial plan for three cases, yielding a total of 24 diagnoses and
accompanying remediations. Following is a representative, complete
diagnosis for Simulated Case Mike and a sample of the raw data from
which we generated our results:

Mike, a 12-year-old seventh grader with the capacity,
family experiences, and background to perform at or
above grade level in language-related subjects, scores
substantially below level on standardized and
objective-based tests.

Several factors could have affected his ability to develop
encoding and decoding skills: a speech problem that
lingered into school years, farsightedness (in copying and
reading from the board), partial auditory acuity problem,
young kindergartener (sic).

Mike's strengths are that listening skills appear to be
close to his assigned grade (the test did not allow a
ceiling for Mike's capacity). Mike uses his background
and experience to make sense of reading as observed in
passage-independent sentences of the Durrell Oral Reading Test
(i.e., Where does Henry go in the summer? - A camp. How
far did water come in? - Pretty fast). Mike attempts to
make sense from the book.

Mike's weaknesses are inappropriate phrasing-fluency.
He ignores punctuation.

Reversals of letters occur both when he hears the sound and
encoding and when he sees the symbol and decodes.

He deletes sounds when decoding, deletes symbols when
encoding. Blends are identified correctly when he finds
the blend symbol after hearing a word read.

Blends are incorrectly decoded: a vowel inserted or
substitution of letters or one letter is ignored.

Related language skills are speech problems. Some speech
problems with r and ending sounds were heard from
interview.

Mike's handwriting was a combination of manuscript and
cursive. The letters did not descend and the ascending
letter barely ascended. The letters were poorly formed.
Spacing was inappropriate.

Like all the others, this diagnosis is narrative in style and
consists of three types of statements: strengths (child
characteristics seen as helpful for reading), weaknesses (characteristics
seen as being problematic for reading), and observations (statements that
are either neutral or not clearly statements of strength or weakness).

Commonality 

Little can be deduced from a single diagnosis. A better view
of the common contents of the diagnoses is seen in the commonality
statistics for the 162 standardized categories (Table 7). Table 7
includes only those categories mentioned in at least half the diagnoses
for a given case. Only 16 categories met this minimum level of
commonality. Categories most agreed upon were related primarily to
reading potential, poor sight words, poor oral reading, poor word
analysis skills, and poor attitude.

Mean diagnostic commonality provides an overall statistic
representing the commonality of an entire study. It is obtained by
averaging the commonality statistic across all cases and diagnostic
categories. The maximum possible mean commonality is one (1). This
mean value is obtained only when all diagnostic categories are
mentioned by all clinicians diagnosing the same case for all cases
involved in the given study, that is, when all diagnoses for each case
are identical. The minimum value of the mean commonality statistic is 1
divided by the number of diagnoses obtained for each case. For
example, in the present study six diagnoses were obtained for each case.
Hence, the minimum mean commonality value is 1 divided by 6, or .17.
The minimum value is obtained when there is no agreement and all
diagnoses are completely unique. The mean commonality obtained for this
study is .26. Since this value is only .09 above the minimum, it can be
seen that the extent of agreement among diagnoses is very low. Sixty
percent of the standardized categories were mentioned only once.
Fewer than 3% of the standardized categories were mentioned in half or
more of the sessions.
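
The bounds on mean commonality can be checked with a short sketch. It assumes, consistent with the stated minimum of 1 divided by the number of diagnoses, that the mean is taken over categories mentioned at least once for a case; it handles a single case, whereas the study statistic also averages across cases. The function name and the invented category numbers are illustrative only.

```python
def mean_commonality(diagnoses):
    """Mean commonality over categories mentioned at least once for a case.

    `diagnoses` is a list of sets of category numbers for one case
    (a hypothetical representation chosen for this sketch).
    """
    n = len(diagnoses)
    mentioned = set().union(*diagnoses)
    per_category = [
        sum(category in diagnosis for diagnosis in diagnoses) / n
        for category in mentioned
    ]
    return sum(per_category) / len(per_category)


# Six completely unique diagnoses: no category is shared (invented data).
all_unique = [{1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}, {11, 12}]

# Six identical diagnoses: every category is mentioned by every clinician.
all_identical = [{1, 2, 3} for _ in range(6)]

minimum = mean_commonality(all_unique)
maximum = mean_commonality(all_identical)
print(f"minimum = {minimum:.2f}, maximum = {maximum:.2f}")
```

Under these assumptions the two extreme cases give 1/6 (about .17) and 1, the bounds against which the obtained value of .26 is compared.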


Table 7

Diagnostic Categories Mentioned
Most Frequently Across All Four Cases

                                                  Case
Category                               1,1R   2,2R   3,3R   4,4R

92   Sight words low (W)                .33    .00    .83    .00

81   Phonics weak (W)                   .33    .00    .00    .67

36   At least average
     reading potential (S)              .67    .33    .50    .67


60 


Poor oral reading (W) 


.50 

".33 


.67 


.33 


.00 


44 , 


Adequate verbal 
skills (S) 


.50 


.50 


.00 


86 


Problem with vowels 
vowels (W) 


.50 


.33 


.33 





Table 7 continued

                                                  Case
Category                               1,1R   2,2R   3,3R   4,4R

21   Auditory acuity problem (W)        .00    .50    .00    .67

25   Attitude toward reading
     poor (W)                           .50    .00    .00    .00

84   No problem with isolated
     letter sound skills (S)            .50    .00    .00    .00

11   Speech problem (W)                 .00    .50    .00    .00

106  Problem with syllables (W)         .00    .50    .00    .00

115  Handwriting problem (W)            .00    .50    .00    .00

50   Problem with visual
     memory (W)                         .00    .00    .50    .00

76   Poor word analysis (W)             .00    .00    .00    .50

109  Auditory discrimination
     problem (W)                        .00    .00    .00    .50

Note. Statements listed are those mentioned in 50% or more of the
diagnoses for a single simulated case.
Diagnostic Agreement

The analysis for diagnostic agreement showed that, on the average,
two different clinicians agreed on only about 10% of the categories.
When the diagnostic statements across two cases (case/replicate) for
a single clinician were compared, on the average, about 20 percent
of the categories mentioned by the clinician the first time s/he
diagnosed a case were repeated when the replicate was diagnosed. The
data indicate that written diagnoses across and within clinicians for
the same case are extremely unreliable (Table 8).

Table 8 

Reliability of Diagnoses and Cues 




Cue Collection Commonality 

The major categories of cues most commonly collected were those
that provided information about reading potential, oral reading,
silent reading comprehension, listening comprehension, word
recognition and word analysis, and home/school background information.
Forty percent of the cues were collected once versus 60 percent of the
diagnostic statements mentioned only once. Further, 30 percent of the
cues were collected in half or more of the sessions (versus only 5
percent of the diagnostic statements being mentioned in half or more
of the sessions).

Reading clinicians show a higher level of agreement on what data
should be collected during the case work-up (cues) than they do in
stating the diagnosis based upon such data (Table 8). On the average,
any two clinicians agreed on 30 percent of the cues collected by both,
in contrast to 10 percent agreement for diagnostic categories. On
the average, the same clinician diagnosing the same case at two
different times agreed on 43 percent of the cues collected in both
sessions, in contrast to 23 percent for diagnostic statements.

The unexpectedly low diagnostic agreement in this study was
startling, particularly since the clinicians who participated were
highly trained (all but two had doctoral degrees) and had an average
of ten years experience in their field. These findings raised three
questions. First, was the low reliability a valid generalizable
finding (i.e., not due to sampling error)? Second, would other
professionals involved in the diagnosis and treatment of children with
reading difficulties perform similarly? Third, did the statistics
that indicated low agreement really reflect unreliable decision making,
or was reliable performance being masked by artifacts of procedure
or data analysis? Five further studies were carried out to address these
questions.

Generalizability Studies
The five studies have been organized into two categories: those
that focus on generalizing across populations and those that focus on
modifying procedures and data analysis. This section focuses on the
results of three studies that tested the generalizability to other
populations of the finding of low diagnostic agreement for reading
specialists. Each study description includes (1) the specific
purposes of the study, (2) the population, (3) any design or procedural
differences from the first, observational study, and (4) results.
Learning Disability Study

A learning disability study was designed to (1) verify the initial
observational study results with a different sample of reading
specialists and new cases, (2) initially examine the interclinician
agreement for learning disability specialists, and (3) compare the
performance of these two groups of practitioners across cases in both
fields (Van Roekel, Note 7).

Additional materials were prepared to accommodate the differing
diagnostic training received by learning disabilities specialists.
Since none of the existing cases contained the type of information
necessary to complete a learning disabilities diagnosis, two simulated
cases were developed for this study. The learning disability case
was based on a real child with a learning disability; the reading case
was one of the original reading cases modified to include learning
disability measures that indicated no problem. Ten reading and ten
learning disabilities clinicians each diagnosed both the reading and
the learning disability cases. The subjects were randomly selected
volunteers chosen from a list compiled by local school districts
of qualified specialists. No intraclinician agreement measures
were possible, since there was no test-retest design. The procedures
for each session were identical to those in the 1977 observational
study except that a 60-minute time limit on information collection
was imposed for the first part of the session. The decision to limit
the sessions to 60 minutes was based upon observations in the
earlier study. First, most clinicians finished within one hour, and
second, although some took longer than one hour, there was no
correlation between time and diagnostic agreement. Subsequent studies
established that imposition of a time limit resulted in diagnostic
agreements that were comparable to and sometimes higher than those
in the original study.

The results of the learning disability study paralleled those
of the 1977 study. Mean diagnostic commonality was .09 (S.D.=.04).
In this study, the minimum commonality was .05 since 20 diagnoses were
obtained for each case. The obtained value is only slightly higher
than the minimum value. Despite lengthy individual diagnostic
write-ups, few statements had a commonality of .5 or better (i.e., had
been mentioned in at least half the diagnoses for a case). For the
learning disability case, the only statements that were highly agreed upon
were (1) weakness in gross/fine coordination and (2) problem with
visual perception/discrimination/memory/motor skills. For the
reading case, the most agreed upon statements were average
intellectual potential, problem with attitude/interests, weak phonic
analysis skills, and observations about contextual reading ability.

Interclinician agreement within each group revealed no differences.
Both groups performed at a near zero level of reliability even within
their own area of specialization. Only 5% of the diagnostic statements
made about a case could be agreed upon by any two clinicians examining
that case (Table 9).

Cue collection showed a higher level of agreement than did the
diagnostic results. The mean commonality for cues collected was
.17 (S.D.=.16); the mean interclinician Phi for cues was .20 (S.D.=.14).

The diagnostic and cue agreement of the reading clinicians in
this study very nearly duplicated the intercorrelations reported for
the 1977 study. The researchers therefore felt that sampling error
would be an unlikely explanation for the low diagnostic agreement of the
reading specialists. Also, the learning disabilities clinicians were
indistinguishable from the reading specialists in the low reliability
of their diagnoses. It would appear that differences in the training
of these two groups of professionals do not translate into differences
in diagnostic performance.

Table 9

Diagnostic Agreement of Reading (R)
and Learning Disabilities (LD) Specialists

                                  R      LD     Both

Reading Case
  Mean Phi                       .06    .04    .05
  S.D.                           .10    .12    .11

Learning Disability Case
  Mean Phi                       .01    .07    .04
  S.D.                           .11    .13    .13

Classroom Teachers 

A classroom teacher study was designed to (1) examine the
interclinician agreement of classroom teachers and (2) compare the
performance of these teachers in experimental and classroom settings
(Gil, 1979).

Moving from the study of highly trained reading and learning 
disabilities specialists to that of classroom teachers required the 
preparation of new simulated cases to accommodate a teacher's training 
and experience. Two cases were built around materials normally 
available to the classroom teacher: background information, samples 
of oral reading (tape recordings and accompanying transcriptions), 
and comprehension performance based on this reading. The cases did 
not include formal standardized measures of reading ability. 
The design called for ten classroom teachers of diverse background
(five from Michigan and five from Illinois) to (1) diagnose two
different simulated cases and (2) discuss children in their own classrooms
who matched the diagnostic profile of the simulated cases. Since the
teachers did not diagnose the same case twice, no intraclinician
analysis could be performed.

The procedures for each session were identical to those for the
1977 study with the exception of a 60-minute time limit imposed on
data collection in the session's first part.

Diagnostic commonality for the classroom teacher study showed
that only 6% of the total diagnostic statements were mentioned
in three or more of the ten sessions. Mean commonality across
diagnostic statements was .14 (S.D.=.09). The minimum mean commonality was
.1 since a total of ten diagnoses were obtained for each case. The
obtained value is only .04 above the minimum value. The diagnostic
statements mentioned most frequently were poor comprehension, strength
of major vocabulary concepts, sight words weak, ignores endings,
sight vocabulary good, phonic skills weak, problems with oral reading,
and lacks word attack skills.

The extent of individual (interclinician) agreement between two
teachers on diagnostic statements was near zero (Table 10). Commonality
for cues collected was slightly higher than for diagnostic judgments
(mean^.l^S.D.^.aS) . Individual agreement on cues was near zero. 

These findings paralleled those of the two preceding studies:
Teachers, too, exhibited extremely low levels of agreement on the same
case. It would seem that the training and experience of classroom
teachers make them no more reliable in the types of diagnostic
decisions they must make than are the reading and learning disabilities
specialists.

The classroom component of this study was designed to assess
whether teachers' diagnostic statements about real cases of reading
disability in their classrooms were similar to their statements about
simulated cases in the laboratory situation. Brief summaries of the
simulated cases were presented to each teacher, who was asked to
identify a child in the classroom who most closely resembled one of the
descriptions. Finally, the teacher was asked to describe the real
child's reading problems. The analysis of classroom versus laboratory
diagnoses showed that the diagnostic categories most frequently
mentioned in the laboratory were also mentioned in the classroom. The
data show that simulated cases in reading elicit the use of criteria
similar to those found in natural settings.

Table 10

Mean Individual Diagnostic Agreement
for Classroom Teachers

                                    Case

Mean Phi                       .04       .03
Standard Deviation             .13       .11

Mean Porter                    .05       .06
Standard Deviation             .06       .06



Group-Administered Simulated Cases

The purpose of the group-administered simulated case study was
to test the generality of the findings of previous studies to a less
restrictive information collection procedure (Stratoudakis, Note 6).
This case contained the same categories of information as the individual
one but with a smaller variety of measures. Case information was
presented in a looseleaf notebook. The critical procedural difference
was that subjects, working alone but tested in a group, could interact
with a case simultaneously under the supervision of just one experimenter,
and they were free to thumb through the information in any order and
at any rate.

The subjects in the study were 12 certified classroom teachers who
had received a top grade in the graduate reading diagnosis course at
Michigan State University. They examined three simulated cases in
the individual or group format. Other variations on the procedures of
the 1977 study included (1) a 30-minute limit on data collection, (2)
subject translation of written diagnosis to standardized vocabulary,
and (3) no debriefing session. Additional variations that applied
only to the group-administered case were that (1) less information
was available, (2) subjects examined case data without experimenter
intervention, and (3) subjects examined the case in a group setting.
All these variations on the 1977 procedure were designed to reduce
the cost and complexity of data acquisition.

The commonality results show that once again, most diagnostic
categories were mentioned by only one teacher. Mean commonality
across diagnostic statements was .20 (S.D.=.13). The minimum commonality
was .17 since a total of six diagnoses were obtained for each case.
The obtained value is only .03 above the minimum.

The categories mentioned most often differed little from those
described in all previous studies. Categories mentioned most often
included blends, sight-word recognition, word analysis, oral reading,
fluency, and visual discrimination. The findings held for both the
individual and group-administered simulated cases.
Individual diagnostic agreement remained largely unchanged despite
the altered format and procedures, as shown in Table 11.



Table 11

Mean Individual Diagnostic Agreement
for Group Administered Simulated Cases

         Two Simulated Cases          Two Simulated Cases
         (Experimenter Controlled)    (Experimenter Controlled
                                      and Subject Controlled)
         (N=39)                       (n=30)

Phi
  Mean        .07                          .10
  S.D.        .06                          .06



Mean commonality on cues was .34 (S.D.=.20); mean individual
agreement was .17 (S.D.=.16). In general, these last three studies seem to
indicate that (1) the low-agreement findings were not uniquely
characteristic of the particular clinicians in the 1977 study; (2) the low
agreement findings do seem to characterize the decision making of other
education professionals; and (3) the low agreement is probably not an
artifact of the particular cases and procedures used in the 1977 study.

Replication Studies
In this section we will present the results of two operational
replicates (Borg and Gall, 1979) of the 1977 study. These replicate
studies were aimed at examining the artifacts of experimental
procedure and data analysis that could have accounted for the finding of
low reliability in the earlier studies. The description for each
study will include (1) the purposes of the study, (2) the population,
(3) the design and procedures only as they differed from the first
observational study, and (4) results.



Study of Vocabulary Standardization Procedures 

The purposes of the vocabulary study were to (1) provide an
operational replication of the 1977 study and (2) investigate
methodological problems in experimenter translation of natural language
diagnostic statements into a standard vocabulary (Hoffmeyer, Note 5). A
concern was that in categorizing the clinicians' natural language
statements, the experimenters might have failed to see equivalences.
In that case, statements that were actually describing the same
thing would be coded as being dissimilar, and agreement would appear to
be very low. Conversely, statements actually describing different
problems might incorrectly be equated.

The design and simulated cases of the study were identical to
those of the 1977 study. The eight subjects were senior clinicians who
had been nominated by university faculty; all but two had doctoral
work in reading. Each subject diagnosed three simulated cases, the
first and third being the same case with minor changes (i.e., name, sex,
etc.). Some procedures differed from the other studies.

First, participants were given a 45-minute time limit for data
collection to reduce the possibility that subjects might become
confused by an information overload. Second, subjects were not provided
with an inventory of available information to address the possibility
that the inventory was stimulating subjects to ask for information
that they would not otherwise request. Third, subjects translated all
natural language statements in their written diagnoses to a standardized
checklist. The checklist was empirically derived from clinician
statements in the preceding observational studies. Fourth, the
debriefing session was conducted via a written questionnaire.

Once again, commonality across cases focused on the same
diagnostic categories as in the original investigations: (1) average
intellectual potential, (2) poor oral reading, (3) sight words low,
(4) phonics weak, (5) poor word analysis skills, and (6) problem with
auditory acuity. There was some agreement on two additional categories:
problem with comprehension and poor attitude toward reading. For any
given case only a few statements could be gleaned that represented
commonality on case characteristics. Mean commonality on diagnostic
statements was .24 (S.D.=.13). The minimum mean commonality is .17
since a total of six diagnoses were obtained for each case. The
obtained value is only .07 above the minimum value.

The results for interclinician agreement showed a slight increase
over the original study (Table 12).



Table 12

Mean Individual Diagnostic Agreement
for Study of Vocabulary Standardization Procedure

                       Diagnostic Agreement

                 Interclinician    Intraclinician

Phi
  Mean                .11               .32
  S.D.                .08               .11

Porter
  Mean                .08               .21
  S.D.                .05               .07
The results for intraclinician agreement remained essentially
the same. Therefore, while the original method of standardizing
vocabulary may have produced slight underestimation of the individual
agreement results, the differences are clearly not great enough to implicate
the translation procedure as the explanation for the generally low
reliability.

Again there was more commonality on which cues to examine than on
what diagnostic statements to derive from them. Mean commonality on
information requested was .43 (S.D.=.26). Individual agreement was
lower: interclinician mean was .33 (S.D.=.15); intraclinician mean was
.42 (S.D.=.20).

Study of Diagnosis and Remediation

This study had several purposes (Weinshank, 1982). First, it
provided an operational replication of the 1977 study. Second, it studied
the reliability of remediated diagnoses. In all previous studies, the
analysis was performed on all diagnostic statements without reference
to the remediation. In this study, the analysis was altered to
additionally examine the reliability of those diagnostic categories for which
remedial recommendations were made. The original method might have
inadvertently "swamped" substantial agreement by combining a few reliable,
important diagnostic categories linked to remediation with a larger number
of unreliable, unimportant ones that would not be linked with remediation.
Third, the study examined the reliability of remedial statements
themselves. Since remediation prescriptions lead to actions, it was
conjectured that there would be greater reliability with respect to
remediation prescriptions chosen. Finally, the relationship between diagnosis
and remedial plans was examined.



The design and simulated case materials of the study were essentially
identical to the 1977 study. The subjects were practicing reading
teachers, all of whom held master's degrees and had an average of 11
years of experience each. Four had received their graduate training in
Michigan, the other four in Illinois.

The major procedural differences between this study and the
original one were (1) the use of a standardized diagnostic checklist
for subject translation of diagnostic statements, (2) the use of a
standardized remedial checklist for subject translation of remedial
statements, (3) development of new debriefing procedures used only in
the final session and focused on reasons for associating or not
associating diagnostic and remedial statements, and (4) the addition of
analysis of the reliability of diagnostic statements linked to remedial
plans, and associations between specific diagnostic and remedial statements.

This study examined five products for each session. Two of these
were diagnostic and cue performance records examined in all previous
studies. Further analysis was performed on (1) remedial statements,
(2) diagnostic statements linked to remediation prescription, and (3)
the associations made between specific diagnoses and specific remediation
prescriptions (diagnostic/remedial associations).

The mean commonality for four of these products is summarized in
Table 13.

The minimum mean commonality is .17 since six diagnoses were
obtained for each case. As in the previous studies, there is very little
difference between the obtained and the minimum mean commonality.

Ten percent of the diagnostic categories that were mentioned
accounted for whatever commonality existed across cases. Those
categories were (1) at least average intellectual potential and
problems with (2) word recognition, (3) word analysis, (4) oral
reading, (5) silent reading, (6) comprehension, (7) auditory/visual
acuity, (8) auditory discrimination, and (9) affect.

Table 13

Mean Commonality: Study of Diagnosis and Remediation

Products                      Mean      S.D.

Diagnosis                     .26       .15

Remediation                   .24       .13

Cues

Remediated Diagnosis          .24       .13

Similarly, ten percent of the remedial categories mentioned 
accounted for whatever commonality existed across cases. Those 
categories were (1) sight words, (2) phonetic analysis, (3) structural 
analysis, (4) oral reading, (5) visual problems, (6) comprehension, 
and (7) motivation. 

Table 14 shows the results for interclinician and intraclinician
agreement. Overall, global diagnostic reliability and cue reliability
remained similar to the other studies; commonality on remedial actions
to be used was also unreliable, but individual agreement on remedial
actions showed slightly more reliability; agreement on precisely which
diagnoses warranted treatment was no better than for the global diagnosis;
and the relationship between specific diagnoses and specific remediations
was shown to be near zero.
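One way to picture the Phi values in Table 14 is as the phi coefficient of a 2 x 2 table that cross-classifies each standardized category by whether each of two diagnoses mentions it. The sketch below assumes that coding, which is not spelled out in this section. The function name, the category universe, and the two example diagnoses are hypothetical. The Porter statistic is not reproduced because its formula is not given here.

```python
import math

def phi_coefficient(diag_a, diag_b, categories):
    """Phi coefficient between two diagnoses coded as sets of categories."""
    both = sum(1 for c in categories if c in diag_a and c in diag_b)
    only_a = sum(1 for c in categories if c in diag_a and c not in diag_b)
    only_b = sum(1 for c in categories if c not in diag_a and c in diag_b)
    neither = len(categories) - both - only_a - only_b
    # Product of the four marginal totals of the 2 x 2 table.
    denom = math.sqrt((both + only_a) * (only_b + neither)
                      * (both + only_b) * (only_a + neither))
    if denom == 0:
        return 0.0  # phi is undefined with an empty margin; 0 is a convention
    return (both * neither - only_a * only_b) / denom

categories = range(1, 11)      # hypothetical universe of ten categories
clinician_1 = {1, 2, 3, 4}     # categories mentioned by one clinician
clinician_2 = {3, 4, 5, 6}     # categories mentioned by another

print(round(phi_coefficient(clinician_1, clinician_2, categories), 2))  # 0.17
print(phi_coefficient(clinician_1, clinician_1, categories))            # 1.0
```

In this invented example the two clinicians share half of their statements, yet phi is only .17, which illustrates how quickly the coefficient falls when diagnoses overlap only in part.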

In general, replication studies seem to show (1) that the finding
of low reliability can be replicated with other samples from a population
similar to that of the first study, (2) that the low reliability
was not induced by experimenter error during the standardization of
the natural language diagnoses, and (3) that remediations and the diagnoses
linked to remediations were no more reliable than global diagnoses.



Table 14

Mean Individual Diagnostic Agreement:
Study of Diagnosis and Remediation

                                          Clinician Agreement
                                       Inter               Intra
                                   Phi     Porter      Phi     Porter

Diagnosis                          .16     .11         .23     [illegible]
Remediation                        .14     .10         .29     .20
Remediated Diagnosis               .13     .08         .22     .14
Remedial/Diagnostic Associations   .18     .00        -.10     .02
Cues                               .24     .29         .31     .49



Measures of Diagnostic Process

In addition to the results presented for the preceding six studies,
some of the processes by which the subjects reached their decisions have
been documented. Because of variations in procedures, process statistics
are available only for the original 1977 study, the learning
disability study, and the classroom teacher study.

First, the data show a broad range of time taken for cue collec- 
tion and examination. About one-fourth of the subjects took 30 minutes 
or less, while another fourth took an hour or more. 

Second, the number of cues requested varied widely. In the three
studies (N=84), the subjects requested a mean of 33 items from their
case inventory (range=[illegible]-89). The number of cues collected was only
moderately related to the length of time the subject took to examine
them (r=.37). A subject who took an hour or more to examine requested
information did not necessarily collect many more cues than a subject
who took half an hour or less. In sum, neither time taken nor number of
cues collected appears to be significantly correlated with diagnostic
agreement.
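The relation between time taken and cues collected is reported as a single correlation. Assuming it is a Pearson product-moment coefficient (the statistic is not named here), it could be computed as in the sketch below. The function name and the five data points are invented for illustration and are not the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical subjects: minutes spent and number of cues requested.
minutes = [25, 30, 45, 60, 70]
cues = [30, 20, 40, 25, 45]

r = pearson_r(minutes, cues)
print(round(r, 2))  # 0.55
```

A coefficient in this moderate range still leaves room for a subject who spends an hour to request no more cues than one who spends half an hour, as the second and fourth invented subjects show.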

Subjects requested cues in order to formulate a diagnosis. During
this process they were asked to verbalize their thinking. For the 1977
study, statements of hunches or hypotheses were extracted from these
verbalizations. Hypotheses were analyzed with respect to when they were
initially considered: during the first, second, third, or fourth quarter
of the session. The results show that almost half of all hypotheses
were generated in the first quarter. Hypothesis generation declined
dramatically from this point to less than 10 percent in the final
quarter. However, this was not the pattern for cue collection across
quarters. The number of cues requested remained essentially constant
across all four quarters.

The mean results for cue and hypothesis performance across cases
in the three studies are summarized in Figure 1.

Figure 1. Mean results for cue and hypothesis performance
across cases in three studies.

The figure shows that the clinicians continued to collect information
long after most hypotheses had been generated. It may be that
the subjects needed additional information in order to confirm or reject
existing hypotheses. Alternatively, they may have continued to
collect data to increase their confidence in judgments already made.
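The quarter-by-quarter tally described above can be sketched as follows. The sketch assumes that each hypothesis and each cue request carries a time stamp measured from the start of the session, which is an assumption about the records rather than a documented format. The function name and all times are hypothetical, chosen only to resemble the reported pattern.

```python
def share_by_quarter(event_times, session_length):
    """Fraction of events falling in each quarter of a session."""
    counts = [0, 0, 0, 0]
    for t in event_times:
        # min() keeps an event at the very end of the session in quarter 4
        quarter = min(int(4 * t / session_length), 3)
        counts[quarter] += 1
    total = len(event_times)
    return [c / total for c in counts]

# Hypothetical 40-minute session; times in minutes from the start.
hypothesis_times = [1, 2, 4, 5, 9, 12, 15, 22, 27, 38]
cue_times = [2, 6, 11, 14, 21, 26, 31, 37]

print(share_by_quarter(hypothesis_times, 40))  # [0.5, 0.2, 0.2, 0.1]
print(share_by_quarter(cue_times, 40))         # [0.25, 0.25, 0.25, 0.25]
```

Comparing the two lists side by side shows the contrast the figure describes: hypotheses cluster early while cue requests stay flat.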

The process of cue collection, hypothesis generation, and verification
culminated in the subjects' writing of diagnoses containing
some subset of the hypotheses considered. In the 1977 study, an average
of 39 percent of the hypotheses considered were confirmed and stated
in the diagnosis as characterizing the case. That is, of an average
of 31 hypotheses, 12 were carried over into the written diagnosis. The
remaining hypotheses are assumed to have been rejected or simply
forgotten.

As a result of our analyses of process, our general impression
is that the professionals studied did not have an overall strategy or
framework for reliably linking cues with hypotheses and hypotheses
with diagnoses.

Summary of Findings 
There are two related findings across all studies: commonality
and individual agreement are both very low. The findings on commonality
show that most statements in a written diagnosis and remediation for the
same case are mentioned by only one clinician. Mean commonality for
all studies is summarized in Table 15. By far the most frequently
mentioned categories within and across studies are potential for
reading, sight words, word analysis, oral reading, attitude, comprehension,
visual discrimination, and auditory acuity.

Table 15

Mean Diagnostic Commonality Across Studies

Study                         Mean    S.D.

Initial Observational         .26     .14
Learning Disabilities         .08     .04
Classroom Teachers            .14     .09
Vocabulary Standardization    .24     .13
Group Admin. Cases            .21     .13
Diagnosis & Remediation       .26     .15
Grand Mean                    .20     .11



A second major finding across all studies is the low individual
diagnostic agreement. Individual agreement data for all studies are
summarized in Table 16.



Table 16

Mean Individual Diagnostic Agreement Across Studies

                                        Clinician
                               Inter     Inter     Intra     Intra
Study                          Phi       Porter    Phi       Porter

Initial Observation            -.10      .10       .13       .23
Learning Disabilities          -.02      .03
Classroom Teachers             -.04      .06
Vocab. Standard.               .11       .08       .32       .21
Group Administered Cases       .09                 .14
Diagnosis & Remediation        .15       .11       .23       .16
Grand Mean                     .03       .08       .21       .20



Low diagnostic and remedial reliability appears to be a robust phenomenon.
While the reliability for two clinicians diagnosing the same
case is lower than that for a single clinician on a case and its replicate,
both levels of reliability are low. The mean across all studies
is not much better than chance.

Additional analyses performed on cue records show that the reliability
of choosing information is consistently greater than that for
diagnosis and remediation, both for commonality (mean=.32) and individual
agreement (mean Phi=.18, mean Porter=.23).
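As a consistency check on the summary tables, the grand means in Table 16 can be recomputed from the study means. The sketch below assumes that each grand mean is the unweighted mean of whichever study means are reported in that column, and that the two Group Administered Cases values belong to the Inter Phi and Intra Phi columns. Both points are assumptions made when reading the table, not statements from the authors. The variable and function names are hypothetical.

```python
from fractions import Fraction

# Study means as read from Table 16, in table order; None marks a cell
# with no reported value. Column placement for Group Administered Cases
# (fifth entry) is an assumption.
table_16 = {
    "Inter Phi":    ["-.10", "-.02", "-.04", ".11", ".09", ".15"],
    "Inter Porter": [".10", ".03", ".06", ".08", None, ".11"],
    "Intra Phi":    [".13", None, None, ".32", ".14", ".23"],
    "Intra Porter": [".23", None, None, ".21", None, ".16"],
}
reported_grand_mean = {"Inter Phi": ".03", "Inter Porter": ".08",
                       "Intra Phi": ".21", "Intra Porter": ".20"}

def unweighted_mean(cells):
    """Exact mean of the reported cells, skipping missing ones."""
    values = [Fraction(c) for c in cells if c is not None]
    return sum(values) / len(values)

for column, cells in table_16.items():
    mean = unweighted_mean(cells)
    gap = abs(mean - Fraction(reported_grand_mean[column]))
    # Within half a unit in the second decimal place counts as agreement.
    print(column, float(mean), gap <= Fraction(1, 200))
```

Under these assumptions every column agrees with its reported grand mean to within rounding, which supports this reading of the table layout.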

Analysis of the diagnostic and remedial process in three studies
revealed wide variability in clinician performance. The total time
taken for cue collection and the number of cues that were collected
varied greatly among participants and did not significantly correlate
with diagnostic agreement. Two behaviors that did appear to be constant
across clinicians were (1) hypothesis generation decreased sharply across
each session but (2) the number of cues collected remained constant
across the entire session.



Discussion



Reading experts show near universal agreement on the importance
of diagnosis as the basis for the remediation of reading problems.
Authorities are also in broad agreement on the following major contents
of a diagnosis, though weighting them differently: (1) determination of
overall potential for reading, (2) performance on specific skills, and
(3) exploration of causal factors. The literature shows similar
agreement on the conduct of the diagnosis. Depending on the particular
case, diagnosis may be conducted (1) by the classroom teacher using
group tests, (2) by the reading specialist using individualized reading
diagnosis instruments in the school setting, and (3) by a group of
professionals from diverse fields in a clinic setting.

The empirical studies reported in this paper show that reading
professionals, as a group, produce aggregate diagnoses that include
statements about reading potential, strengths and weaknesses in skills,
and selected causal factors (hearing, vision, and attitude), thus
conforming to the recommendations of authorities in the field.

Individually, the diagnoses show significant deviations from the
recommendations of the experts. First, the diagnoses include a large
number of one-time-only statements of questionable relevance to remediation.
Second, the diagnoses fail systematically to mention the reading
skills of greatest import to remediation. Third, even when important
skills are mentioned in the diagnosis, these statements are not
reliably linked with treatment.

One possible explanation for these low agreements might be found
in the use of simulated cases in an experimental environment. However,
the use of real children in a natural setting introduces factors that
might further decrease agreement, since a child's behavior and performance
would be expected to change, thereby introducing unreliability in the
data base for a case.

The differential effects of using real and simulated cases have
been studied in medicine. No differences were found when diagnoses
were compared for (1) people with real medical problems, and (2) people
coached to simulate the same medical problems (human simulation).
Further, in studies comparing human simulation of medical problems with
simulated cases similar in format to those used in our studies,
differences were found in procedure but not in the final diagnoses.

A second possible explanation for the low diagnostic agreement found
in our studies lies in the nature of the training that reading specialists
receive. A comparison of programs in medicine and reading is instructive
here. Medical training is based upon (1) an organized body of empirically
based knowledge that relates specific remedies to specific
problems; (2) systematic techniques governing the collection of cues;
and (3) perhaps most importantly, the supervised diagnosis, treatment,
and follow-up of thousands of cases. By contrast, training in reading
tends to lack all three of these characteristics, and instead
has (1) non-empirically verified theoretical concepts, (2) idiosyncratic
cue collection techniques, and (3) supervised diagnosis, remediation,
and follow-up of few cases.

The importance of diagnostic reliability in establishing the
connections between predictors and outcomes requires that action be
taken to provide the kind of training that will support teacher learning
in this skill. Results of training studies (Sherman, Weinshank & Brown,
Note 8; Gil, Polin, Vinsonhaler & VanRoekel, Note 9; Polin, Note 10)
indicate that inter-clinician reliability on key diagnostic categories
(instant word recognition, decoded word recognition, word meanings,
oral reading, silent reading comprehension, listening comprehension,
and attention/motivation) can be increased substantially. Instruction
emphasizing external decision aids (including computer support), an
explicit model of the diagnostic process, and practice with feedback
on a variety of simulated and real cases appears to provide a powerful
common heuristic for data collection and interpretation.

Perhaps the major implication of the studies concerns the question
of the place of diagnosis itself in the correction of reading problems.
In the first place, diagnosis as presently conducted should not be continued.
Should diagnosis be discarded, then, as a precursor to remediation?
The answer depends upon whether differentially effective remediations
exist (i.e., remediations that are more effective for one
problem than another). If remediations are uniformly effective, there
is clearly no need for a diagnosis to guide the selection of treatment.
However, if differential treatments do exist, then reliable diagnoses
are indispensable.

As noted in our introduction, differentially effective remediations
are empirically discernible only when reliable diagnoses are used
in the evaluation of their effectiveness. Given reliable diagnosis, the
stage is set for the aggregation of the data necessary to establish
differentially effective remediation across children and problems.

Finally, if the results reported in this paper prove the rule,
we must not castigate reading clinicians; similar results of low agreement
have been encountered in medicine and psychology. Instead, we
must seek a better understanding of the causes of low diagnostic agreement
and methods by which we may use this knowledge to improve the
training and decision making of reading specialists, and ultimately,
better practice.

APPENDIX

Standardized Diagnostic Categories 

1. Nervous, frustrated child 

2. Work appears organized

3. Normal interests and behavior

4. Aggressive, impulsive child 

5. Attending behavior needs improvement 

6. No attention span problem 

7. Poor self concept

8. No risk-taking behavior

9. Not afraid to try

10. No speech problem

11. Speech problem

12. Speech and hearing problems related

13. Suspected learning disability

14. Health problems in school

15. Small physical size

16. No physical problems

17. Visual acuity and farsightedness problem

18. No vision problem

19. Possible vision problem

20. No auditory acuity problems 

21. Auditory acuity problem 

22. Reading problem may be physical 

23. Dependent reader

24. Lack of motivation for reading

25. Attitude toward reading poor

26. Normal motor skills

27. Visual motor skills adequate

28. Visual/motor problems

29. Normal home environment

30. Parent-school cooperation inadequate

31. Family values reading

32. Parents are cooperative

33. Parents don't like to read

34. Parent anxiety about school and reading

35. Parents' educational background

36. At least average reading potential

37. Good potential for learning sight words

38. Low average reading potential 

39. Meaning Vocabulary weak 

40. Not working up to potential 

41. Comparative statements on verbal and performance scores on WISC 

42. Poor listening comprehension

43. Problem with verbal skills

44. Adequate verbal skills

45. Vocabulary adequate

46. Problem with auditory memory 

47. Auditory memory adequate 

48. Is problem with hearing or with reproduction 

49. Visual memory good 

50. Problem with visual memory 

51. Reading problem

52. Not bad in reading

53. Not performing at grade level 

54. Reading not meaningful act 

55. Not accurate 

56. Operating at first grade level 

57. Word by word reader

58. Reading rate slow

59. Sight reader

60. Poor oral reading 

61. Silent reading score equal to oral reading 

62. Second grade oral reading score 

63. Oral reading makes him look better 

64. No comprehension problem 

65. Reading comprehension inadequate 

66. Good listening comprehension/poor reading comprehension 

67. Second grade reading comprehension skills 

68. Comprehension higher than expected 

69. Reading vocabulary higher than comprehension

70. Poor word identification impedes comprehension

71. Good use of context

72. Inconsistent use of context for word recognition

73. Limited use of contextual analysis

74. Most errors surface errors

75. Did not make deep structure errors

76. Poor word analysis skills

77. Guesses at unknown words

78. No risk taking in word analysis

79. No independent analysis

80. Uses word recognition skills when cued

81. Phonics weak

82. Poor letter sound association

83. Some phonetic ability

84. No problem with isolated letter sound skills

85. Isolated (phonics) instruction problem 

86. Problem with vowels 

87. Knows short vowels in isolation 

88. Consonant blends not a problem 

89. Problem with blends 

90. Adequate word recognition and analysis 
91. Does not use ending cues

92. Sight words low

93. Sight vocabulary not automatic

94. Visual discrimination not a problem

95. Reversal problems

96. Configuration application for word recognition inappropriate

97. Fourth grade sight vocabulary

98. Poor discrimination of redundant letter combinations

99. Insufficient visual discrimination and word scan

100. Ignores or confuses middle letters

101. Inconsistent use of beginning and ending letters and sounds

102. Uses beginning letters and sounds for word identification

103. Problem with discrimination of visually similar words

104. Affixes no problem

105. Inadequate structural analysis skills

106. Problem with syllables

107. Understands idea of syllables

108. Auditorially discriminates sounds in words

109. Auditory discrimination problem

110. Problem with auditory stream

111. Problem synthesizing parts into wholes

112. Can blend short words 

113. Difficulty blending sounds into words 

114. Spelling/writing problems

115. Handwriting a problem 

116. Writing problem 

117. Spelling problems 

118. SAT scores show growth 

119. Needs structured instruction 

120. Inconsistent instruction 

121. Inappropriate instructional materials 

122. Late identification of reading problem 

123. Overplacement in school reading materials is a problem

124. Math near grade level 

125. Problem with numbers 

126. GSA phonics 

127. GSA word chunking 

128. GSA structural analysis 

129. GSA configuration 

130. GSA reversals and deletions 

131. GSA blending

132. GSA blends

133. GSA speed and accuracy 

134. GSA sight words 

135. GSA word recognition and word analysis 

136. GSA words in isolation 

137. GSA discrimination and memory 

138. GSA visual memory 

139. GSA performance in reading, compared to potential 

140. GSA deep structure 

141. GSA potential 

142. GSA verbal and non-verbal performance on WISC 

143. GSA affect 

144. GSA hearing 

145. GSA vision 

146. GSA speech

147. GSA perception

148. GSA physical problems 

149. GSA health 

150. GSA motor performance 

151. GSA home background 

152. GSA teacher relationship 

153. GSA reading as a school subject problem

154. GSA grade level of performance 

155. GSA language 

156. GSA comprehension 

157. GSA math 

158. GSA handwriting 

159. GSA spelling 

160. GSA retrieval 

161. Non-specific observations and questions 

162. Unique statements 






